Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce
Abstract
Bayesian network (BN) parameter learning from incomplete data can be a computationally expensive task. The EM algorithm, the standard approach to this problem, is unfortunately susceptible to local optima and prone to premature convergence. We develop and experiment with two methods for improving EM parameter learning by using MapReduce: Age-Layered Expectation Maximization (ALEM) and Multiple Expectation Maximization (MEM). Leveraging MapReduce for distributed machine learning, these algorithms (i) operate on a (potentially large) population of BNs and (ii) partition the data set, as is traditionally done with MapReduce machine learning. For example, using the Hadoop implementation of MapReduce, distributed ALEM achieved gains in both parameter quality (likelihood) and number of iterations (runtime) for the Asia BN over 20,000 MEM and ALEM trials.
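To make the population side of this concrete, here is a minimal sketch (not the authors' implementation) of the MEM idea: each map task runs EM from a different random initialization, and the reduce step keeps the highest-likelihood run. The run_em helper is a hypothetical stub standing in for a full BN EM routine.

import random

def run_em(data, seed, max_iters=100):
    """Hypothetical stub for an EM run: returns (log_likelihood, parameters)."""
    rng = random.Random(seed)
    theta = rng.random()            # stand-in for a full CPT initialization
    log_lik = -abs(theta - 0.7)     # stand-in for the converged log-likelihood
    return log_lik, theta

def mem_map(data, seed):
    """Map task: one EM run per random restart (one BN in the population)."""
    return run_em(data, seed)

def mem_reduce(results):
    """Reduce task: select the restart with the best log-likelihood."""
    return max(results, key=lambda r: r[0])

data = [...]  # incomplete-data records would go here
results = [mem_map(data, seed) for seed in range(32)]  # parallel map tasks in practice
best_log_lik, best_theta = mem_reduce(results)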
Similar Papers
MapReduce for Bayesian Network Parameter Learning using the EM Algorithm
This work applies the distributed computing framework MapReduce to Bayesian network parameter learning from incomplete data. We formulate the classical Expectation Maximization (EM) algorithm within the MapReduce framework. Analytically and experimentally we analyze the speed-up that can be obtained by means of MapReduce. We present details of the MapReduce formulation of EM, report speed-ups v...
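A minimal sketch of one distributed EM iteration in this spirit, assuming an illustrative two-node BN H -> X (H hidden, X observed, both binary): map tasks compute expected sufficient statistics on their data shard, the reducer sums them, and the M-step normalizes. The network and names are assumptions, not the paper's code.

def e_step_map(shard, p_h, p_x_given_h):
    """Map task: expected counts for P(H) and P(X|H) on one data shard."""
    counts_h = [0.0, 0.0]                    # expected count of H = h
    counts_xh = [[0.0, 0.0], [0.0, 0.0]]     # expected count of (H = h, X = x)
    for x in shard:
        # Posterior P(H = h | X = x) under the current parameters.
        joint = [p_h[h] * p_x_given_h[h][x] for h in (0, 1)]
        z = sum(joint)
        for h in (0, 1):
            gamma = joint[h] / z
            counts_h[h] += gamma
            counts_xh[h][x] += gamma
    return counts_h, counts_xh

def sum_reduce(partials):
    """Reduce task: element-wise sum of the per-shard expected counts."""
    counts_h = [0.0, 0.0]
    counts_xh = [[0.0, 0.0], [0.0, 0.0]]
    for ch, cxh in partials:
        for h in (0, 1):
            counts_h[h] += ch[h]
            for x in (0, 1):
                counts_xh[h][x] += cxh[h][x]
    return counts_h, counts_xh

def m_step(counts_h, counts_xh):
    """M-step: normalize expected counts into new CPT entries."""
    n = sum(counts_h)
    p_h = [c / n for c in counts_h]
    p_x_given_h = [[c / sum(row) for c in row] for row in counts_xh]
    return p_h, p_x_given_h

# One iteration over two shards of observed X values.
shards = [[0, 1, 1, 0], [1, 1, 0, 1]]
p_h, p_xh = [0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]]
partials = [e_step_map(s, p_h, p_xh) for s in shards]   # parallel in MapReduce
p_h, p_xh = m_step(*sum_reduce(partials))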
Bayesian Network Parameter Learning using EM with Parameter Sharing
This paper explores the effects of parameter sharing on Bayesian network (BN) parameter learning when there is incomplete data. Using the Expectation Maximization (EM) algorithm, we investigate how varying degrees of parameter sharing, varying number of hidden nodes, and different dataset sizes impact EM performance. The specific metrics of EM performance examined are: likelihood, error, and the ...
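As an illustration of what parameter sharing means inside EM, here is a sketch of an M-step in which tied CPT columns pool their expected counts before normalizing, so every node in a sharing group receives identical parameters. The node names and data layout are hypothetical.

def m_step_shared(expected_counts, sharing_groups):
    """expected_counts: node -> list of counts (one per state);
    sharing_groups: list of node groups whose CPT columns are tied."""
    new_params = {}
    for group in sharing_groups:
        # Pool counts across every node that shares the parameter.
        pooled = [0.0] * len(expected_counts[group[0]])
        for node in group:
            for i, c in enumerate(expected_counts[node]):
                pooled[i] += c
        total = sum(pooled)
        shared = [c / total for c in pooled]
        for node in group:
            new_params[node] = shared    # same parameters for the whole group
    return new_params

counts = {"X1": [3.0, 1.0], "X2": [2.0, 2.0], "Y": [5.0, 5.0]}
params = m_step_shared(counts, [["X1", "X2"], ["Y"]])
# X1 and X2 are tied: both get [(3+2)/8, (1+2)/8] = [0.625, 0.375]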
Large-Scale Online Expectation Maximization with Spark Streaming
Many “Big Data” applications in Machine Learning (ML) need to react quickly to large streams of incoming data. The standard paradigm nowadays is to run ML algorithms on frameworks designed for batch operations, such as MapReduce or Hadoop. By design, these frameworks are not a good match for low-latency applications. This is why we explore using a new, recently proposed model for large-scale st...
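For context, a common low-latency formulation of EM is stepwise (online) EM in the style of Cappé and Moulines: rather than recomputing statistics over a full batch, each incoming mini-batch's sufficient statistics are blended in with a decaying step size. The sketch below uses an illustrative two-state mixture and is a generic sketch, not the system described in this abstract.

def minibatch_stats(batch, p_h, p_x_given_h):
    """Expected sufficient statistics for one mini-batch (E-step)."""
    s_h = [0.0, 0.0]
    s_xh = [[0.0, 0.0], [0.0, 0.0]]
    for x in batch:
        joint = [p_h[h] * p_x_given_h[h][x] for h in (0, 1)]
        z = sum(joint)
        for h in (0, 1):
            s_h[h] += joint[h] / z / len(batch)
            s_xh[h][x] += joint[h] / z / len(batch)
    return s_h, s_xh

def stepwise_em(stream, p_h, p_xh, decay=0.6):
    """Online EM: s <- (1 - eta) * s + eta * s_batch, then re-normalize."""
    s_h, s_xh = [0.5, 0.5], [[0.25, 0.25], [0.25, 0.25]]
    for t, batch in enumerate(stream, start=1):
        eta = t ** -decay                       # decaying step size
        bh, bxh = minibatch_stats(batch, p_h, p_xh)
        s_h = [(1 - eta) * s + eta * b for s, b in zip(s_h, bh)]
        s_xh = [[(1 - eta) * s + eta * b for s, b in zip(row, brow)]
                for row, brow in zip(s_xh, bxh)]
        # M-step from the running statistics.
        p_h = [s / sum(s_h) for s in s_h]
        p_xh = [[s / sum(row) for s in row] for row in s_xh]
    return p_h, p_xh

stream = [[0, 1, 1], [1, 1, 0], [0, 0, 1]]       # mini-batches arriving online
p_h, p_xh = stepwise_em(stream, [0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]])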
A Genetic Algorithm for Learning Parameters in Bayesian Networks using Expectation Maximization
Expectation maximization (EM) is a popular algorithm for parameter estimation in situations with incomplete data. The EM algorithm has, despite its popularity, the disadvantage of often converging to local but non-global optima. Several techniques have been proposed to address this problem, for example initializing EM from multiple random starting points and then selecting the run with the high...
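A simplified, mutation-only sketch of the evolutionary idea (keep the high-likelihood runs, perturb them, repeat); fitness() is a hypothetical stand-in for the log-likelihood reached by a short EM run, and a real implementation would evolve full CPT vectors rather than a single float.

import random

def fitness(theta):
    """Hypothetical short-EM fitness; peaks at theta = 0.7."""
    return -abs(theta - 0.7)

def evolve(pop_size=16, generations=10, sigma=0.1, seed=0):
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Variation: Gaussian mutation of surviving parameter vectors.
        children = [min(1.0, max(0.0, p + rng.gauss(0.0, sigma)))
                    for p in parents]
        population = parents + children
    return max(population, key=fitness)

best = evolve()   # parameter estimate less likely to sit in a poor local optimum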
Navigating the parameter space of Bayesian Knowledge Tracing models: Visualizations of the convergence of the Expectation Maximization algorithm
Bayesian Knowledge Tracing (KT) models are employed by the cognitive tutors in order to determine student knowledge based on four parameters: learn rate, prior, guess and slip. A commonly used algorithm for learning these parameter values from data is the Expectation Maximization (EM) algorithm. Past work, however, has suggested that with four free parameters the standard KT model is prone to c...
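For reference, the standard KT forward update with the four parameters named above (prior, learn, guess, slip) looks as follows; EM fits these values to observed correct/incorrect sequences. This sketch is illustrative, not the paper's code.

def kt_posterior(p_known, correct, guess, slip):
    """P(skill known | observation) via Bayes' rule."""
    if correct:
        num = p_known * (1 - slip)
        den = num + (1 - p_known) * guess
    else:
        num = p_known * slip
        den = num + (1 - p_known) * (1 - guess)
    return num / den

def kt_trace(observations, prior, learn, guess, slip):
    """Run the KT forward update over a sequence of correct/incorrect answers."""
    p_known = prior
    for correct in observations:
        p_known = kt_posterior(p_known, correct, guess, slip)
        p_known = p_known + (1 - p_known) * learn   # learning transition
    return p_known

# Example: three correct answers with typical-looking parameter values.
print(kt_trace([True, True, True], prior=0.3, learn=0.1, guess=0.2, slip=0.1))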